Random, but not so much 
A parameterization for the returns and correlation matrix of financial time series 
A parameterization for the returns and correlation matrix of financial time series, modified from a previous work, is proposed and its properties are studied. This parameterization allows non-stationarity to be introduced easily and reproduces several characteristics of real, observed series, such as fat tails, volatility clustering, and a spectrum of eigenvalues of the correlation matrix that can be understood as an extension of Random Matrix Theory results. The predicted behavior of the eigenvalues under this parameterization is compared with the eigenvalues of Brazilian assets, and those predictions are shown to fit the data better than Random Matrix Theory.
I. INTRODUCTION

Determining the correct structure of the correlation matrix is an important problem in several different applications, and the methods of Random Matrix Theory (RMT) have been successfully applied to problems in many areas, such as magnetic resonance images, meteorology, and financial time series.

The correct estimation of correlations in Finance is a fundamental step in portfolio choice. The observation that most of the eigenvalues of the correlation matrix can be due to noise can therefore have important consequences, and a model that provides that structure can be a very useful tool in Finance as well as in other areas. RMT does not claim to explain the whole eigenvalue spectrum of financial time series, since a few large eigenvalues remain outside its scope. Also, a number of results are not in perfect agreement with RMT, such as the observation that noise eigenvalues seem to be a little larger than expected and that correlations can be measured in the supposedly random part of the eigenvalue spectrum [10]. Different behaviors of the eigenvalues at different points in time have also been observed, suggesting that non-stationary effects might play an important role [11, 12].

The role of non-stationarity in the eigenvalue spectrum of the correlation matrix was recently studied. Using a model where most eigenvalues are zero in the stationary regime, it was found that non-stationarity can account for several of the eigenvalues in the bulk region of the spectrum [13]. Here, that model will be altered by introducing random components into the stationary regime. This extension provides a parameterization of the problem that reproduces several of the stylized facts about financial series. Simulations of the model will show that the Marcenko-Pastur (MP) distribution [14] is recovered as a limiting case for the bulk eigenvalues when more random components are added. The model also allows the introduction of non-bulk, large eigenvalues in the correlation matrix and can therefore be seen as an extension of the results of Random Matrix Theory.



II. THE MODEL 

In the original model [13], the returns $\mu_i$ and the correlation matrix $P_{il}$, where both $i = 1, \dots, N$ and $l = 1, \dots, N$ refer to the assets, were obtained from an $N \times M$ matrix $\Phi(t)$ that could be a function of the time $t$. The matrix $\Phi$ has components $\varphi_{ij}$, where $i = 1, \dots, N$ labels the different assets and $j = 1, \dots, M$, with $M \geq 3$, labels a collection of $M$ vectors $\varphi$, each with $N$ components. Each of those vectors represents a possible, typical state of the system. Given $\Phi$, the average return vector $\mu$, the covariance matrix $\Sigma$, and the correlation matrix $P$ are given by

$$
\begin{aligned}
\mu_i &= E[r_i] = \frac{1}{M} \sum_{j=1}^{M} \varphi_{ij}, \\
\Sigma_{il} &= \frac{1}{M} \sum_{j=1}^{M} \varphi_{ij} \varphi_{lj} - \mu_i \mu_l, \\
P_{il} &= \frac{\Sigma_{il}}{\sqrt{\Sigma_{ii} \Sigma_{ll}}}.
\end{aligned}
\tag{1}
$$

The observed returns $r_t$, at instant $t$, are generated, as usual, by a multivariate normal $N(\mu, \Sigma)$ likelihood.
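As a minimal sketch of how the moments follow from the state vectors, the snippet below computes the mean vector, covariance matrix, and correlation matrix from a small matrix of states, treating the $M$ columns as equally likely states (prefactor $1/M$). The function name `moments_from_states` and the numbers in `phi_example` are hypothetical and chosen only for illustration; NumPy is assumed to be available.

```python
import numpy as np

def moments_from_states(phi):
    """Return (mu, sigma, corr) for an N x M matrix whose columns are states."""
    phi = np.asarray(phi, dtype=float)
    n_states = phi.shape[1]
    mu = phi.mean(axis=1)                               # average over the M states
    sigma = phi @ phi.T / n_states - np.outer(mu, mu)   # second moment minus mean product
    std = np.sqrt(np.diag(sigma))                       # assumes no asset has zero variance
    corr = sigma / np.outer(std, std)
    return mu, sigma, corr

# Hypothetical example: N = 3 assets described by M = 3 typical states.
phi_example = np.array([[1.0, 2.0, 3.0],
                        [2.0, 4.0, 6.0],
                        [3.0, 1.0, 2.0]])
mu, sigma, corr = moments_from_states(phi_example)
print(mu)
print(np.linalg.eigvalsh(sigma))
```

Because the covariance is built from only $M$ centered vectors, its rank is at most $M - 1$, so in this toy case at least one eigenvalue is zero up to rounding.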

In this article, a simple but powerful extension of this model is proposed. Instead of having a matrix $\Phi$ composed of $M \geq 3$ vectors, each consisting of parameters to be estimated in order to adjust the model, $\Phi$ will be composed of $M + S \geq 3$ vectors. The first $M$ vectors play the same role as before [13], while the $S$ new vectors are pseudo-parameters that are actually randomly drawn at each instant of time. Even though the $S$ new vectors are not real parameters of the model, since they are generated randomly, they will be referred to from now on as random parameters. With the introduction of the random parameters, all sums in Equation (1) now run from 1 to $M + S$. This introduces a random element into the model that causes the return vector and correlation matrix to change in time, even in the stationary case where each of the $\varphi_{ij}$ elements is held constant (at least for finite values of $S$). In order to preserve the variance associated with each return, the random parameters follow a normal distribution $N(0, \Sigma_{ii})$ for each asset $i$.
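The generation step described above could be sketched as follows. This is one possible reading under stated assumptions, not the author's implementation. The sums are assumed to be normalized by $M + S$. The variance used to draw the random vectors at each step is assumed to be the variance implied by the full matrix at the previous step. The first step is assumed to use the variance of the fixed vectors alone. The name `simulate_returns` and the example numbers are hypothetical.

```python
import numpy as np

def simulate_returns(phi_fixed, n_random, n_steps, seed=0):
    """Simulate a T x N array of returns from M fixed and S random state vectors."""
    rng = np.random.default_rng(seed)
    phi_fixed = np.asarray(phi_fixed, dtype=float)
    n_assets, n_fixed = phi_fixed.shape
    n_total = n_fixed + n_random
    # Assumption: the first draw uses the variances of the fixed vectors alone.
    var = (phi_fixed ** 2).mean(axis=1) - phi_fixed.mean(axis=1) ** 2
    returns = np.empty((n_steps, n_assets))
    for t in range(n_steps):
        # S random vectors; component i is drawn with the current variance of asset i.
        random_cols = rng.normal(0.0, np.sqrt(var)[:, None],
                                 size=(n_assets, n_random))
        phi = np.hstack([phi_fixed, random_cols])
        mu = phi.mean(axis=1)
        centered = phi - mu[:, None]
        # Sigma equals centered @ centered.T / n_total, so a standard normal
        # combination of the centered columns has exactly that covariance.
        z = rng.standard_normal(n_total)
        returns[t] = mu + centered @ z / np.sqrt(n_total)
        # Assumption: the next draw of random vectors uses the updated variances.
        var = (centered ** 2).mean(axis=1)
    return returns

# Hypothetical example: N = 4 assets, M = 3 fixed vectors, S = 2 random vectors.
phi_fixed_example = np.array([[0.01, -0.02, 0.03],
                              [0.02, 0.01, -0.01],
                              [-0.01, 0.00, 0.02],
                              [0.00, 0.03, -0.02]])
simulated = simulate_returns(phi_fixed_example, n_random=2, n_steps=500, seed=1)
print(simulated.shape)
```

Drawing the return as a combination of the centered columns avoids factorizing a covariance matrix that is singular whenever $M + S - 1 < N$.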

One nice feature of the original model is that making each of the components $\varphi_{ij}$ follow a random walk generates a non-stationary correlation matrix, with all its properties automatically respected. A simple way to model that is by choosing $\varphi_{ij}(t + 1) = \varphi_{ij}(t) + \epsilon$, where $\epsilon$ is a random increment whose size is set by $\sigma_\epsilon$. However, for long periods of time, this causes the variance to explode. This is not a problem if one is interested only in the correlation but, here, the time behavior of the returns will also be investigated. Therefore, a mean-reversion term will be introduced into the random walk, that is,

$$
\varphi_{ij}(t + 1) = (1 - \alpha)\, \varphi_{ij}(t) + \epsilon, \tag{2}
$$

where $\alpha$ is a small number that measures the strength of the mean-reversion process ($\alpha = 0$ corresponds to no mean reversion). The effect of this term is negligible for short periods of time as long as $\alpha$ is small enough [15].
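To illustrate why the mean-reversion term keeps the variance bounded, the sketch below assumes independent Gaussian increments with standard deviation $\sigma_\epsilon$ (the increment distribution is an assumption made here). Under that assumption the update is a first-order autoregression, whose long-run variance is $\sigma_\epsilon^2 / [1 - (1 - \alpha)^2] \approx \sigma_\epsilon^2 / (2\alpha)$ for small $\alpha$. The function names are hypothetical.

```python
import math
import random

def stationary_std(alpha, sigma_eps):
    """Long-run standard deviation of the mean-reverting walk, for 0 < alpha < 2."""
    return sigma_eps / math.sqrt(1.0 - (1.0 - alpha) ** 2)

def mean_reverting_walk(n_steps, alpha, sigma_eps, phi0=0.0, seed=0):
    """Iterate phi(t+1) = (1 - alpha) * phi(t) + eps with Gaussian eps (assumed)."""
    rng = random.Random(seed)
    path = [phi0]
    for _ in range(n_steps):
        path.append((1.0 - alpha) * path[-1] + rng.gauss(0.0, sigma_eps))
    return path

path = mean_reverting_walk(10_000, alpha=0.001, sigma_eps=0.02)
print(stationary_std(0.001, 0.02))
```

With $\alpha = 0.001$ and $\sigma_\epsilon = 0.02$, the long-run standard deviation is roughly 0.45, whereas for $\alpha = 0$ the variance grows linearly with the number of steps.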



III. RESULTS 

Simulations were performed for the proposed model in order to compare it with real data as well as with the Marcenko-Pastur distribution [3]. The real data correspond to the returns of $N = 38$ Brazilian stocks, observed daily from January 5, 2004 to July 28, 2006, for a total of $T = 644$ observations. Figure 1 shows the behavior of the model for different values of $S$, as a distribution obtained from the histogram of simulated results when $\sigma_\epsilon = 0.02$ (the behavior for $\sigma_\epsilon = 0.0$ is visually almost identical, with a slightly worse fit, and is therefore not shown here). Notice that the Marcenko-Pastur distribution fails to describe the real data, since this is a finite case, away from the limits where it is expected to be valid. On the other hand, the model proposed here does a much better job if $S$ is chosen to be 2 or 3. For the simulated results, the two largest eigenvalues are not shown, since they lie outside the bulk of random eigenvalues (15.1 and 6 for $S = 2$, and 11.9 and 6 for $S = 3$). That means that the model not only describes the observed eigenvalues in the bulk region better, but also generates non-bulk eigenvalues (the real data have one large eigenvalue of 16.2).
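For reference, the standard Marcenko-Pastur density for a correlation matrix estimated from $T$ observations of $N$ uncorrelated series with variance $\sigma^2$ is $\rho(\lambda) = \frac{T/N}{2\pi\sigma^2} \frac{\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}}{\lambda}$, with edges $\lambda_\pm = \sigma^2 (1 \pm \sqrt{N/T})^2$. The formula is not written out in the text, so the sketch below is only a convenience for reproducing the comparison. The function names are hypothetical and $N \leq T$ is assumed.

```python
import math

def mp_edges(n_assets, n_obs, sigma2=1.0):
    """Lower and upper edges of the Marcenko-Pastur bulk (assumes n_assets <= n_obs)."""
    ratio = n_assets / n_obs
    lo = sigma2 * (1.0 - math.sqrt(ratio)) ** 2
    hi = sigma2 * (1.0 + math.sqrt(ratio)) ** 2
    return lo, hi

def mp_density(lam, n_assets, n_obs, sigma2=1.0):
    """Marcenko-Pastur density at eigenvalue lam; zero outside the bulk."""
    lo, hi = mp_edges(n_assets, n_obs, sigma2)
    if lam <= lo or lam >= hi:
        return 0.0
    q = n_obs / n_assets
    return q * math.sqrt((hi - lam) * (lam - lo)) / (2.0 * math.pi * sigma2 * lam)

# Values quoted in the text: N = 38 assets and T = 644 daily observations.
lo, hi = mp_edges(38, 644)
print(round(lo, 3), round(hi, 3))
```

For $N = 38$ and $T = 644$ the bulk runs from roughly 0.57 to 1.54, so the observed eigenvalue of 16.2 lies far outside it.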
FIG. 1: Observed eigenvalues for $N = 38$ Brazilian assets, compared with the simulated results for different values of $S$ in the non-stationary case ($\sigma_\epsilon = 0.02$), as well as the MP distribution.



Another interesting feature that can be observed in Figure 1 is that, as $S$ gets larger, the predicted distribution seems to get closer to the MP distribution. This is actually to be expected. If $M$ is kept constant, the influence of the real vector parameters on the covariance matrix becomes weaker as $S$ grows. For large $S$, the problem tends to a simple sampling problem and the correlation matrix is obtained from a basically random matrix, hence the agreement with RMT results. Since $N = 38$ is a small number of assets for a good visualization, simulations were run with $N = 200$ in order to observe the convergence towards the MP distribution. Those results can be observed in Figure 2.

For $S = 0$, the stationary case corresponding to the results shown in Figure 2 has only exactly zero eigenvalues apart from the few large, non-bulk ones, that is, there is only one large peak in the distribution at $\lambda = 0$. As $S$ grows, the simulated distributions approach the MP distribution reasonably fast, as can be seen from the reasonable approximation for $S = 5$ and the almost exact match for $S = 20$. It is also interesting to notice that, although the non-bulk eigenvalues still survive, they become smaller as $S$ grows. This happens because $M$ was kept constant, so the real vectors become less important for larger values of $S$.

This means that, while $M$ is related to the large eigenvalues, $S$ can be seen as a parameter that measures how close to a random matrix the real data are, as opposed to a simpler model where only the main eigenvalues exist. In that sense, this model provides an extension of RMT results to cases where the randomization is not complete. It also accounts for the largest observed eigenvalues and, therefore, provides a better fit to real data than RMT.
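The fully random limit can be checked numerically with a short sketch. It draws pure Gaussian noise, computes the eigenvalues of its sample correlation matrix, and measures the fraction that falls inside the Marcenko-Pastur band. The choice of $T = 2000$ observations is an assumption made here, since the number of observations used for the $N = 200$ runs is not stated, and the function names are hypothetical.

```python
import numpy as np

def correlation_eigenvalues(returns):
    """Eigenvalues (ascending) of the sample correlation matrix of a T x N array."""
    corr = np.corrcoef(returns, rowvar=False)   # columns are treated as assets
    return np.linalg.eigvalsh(corr)

def fraction_in_mp_band(eigenvalues, n_assets, n_obs):
    """Share of eigenvalues between the Marcenko-Pastur edges for unit variance."""
    ratio = n_assets / n_obs
    lo = (1.0 - np.sqrt(ratio)) ** 2
    hi = (1.0 + np.sqrt(ratio)) ** 2
    inside = (eigenvalues >= lo) & (eigenvalues <= hi)
    return inside.mean()

rng = np.random.default_rng(0)
n_obs, n_assets = 2000, 200                      # n_obs is an illustrative assumption
noise = rng.standard_normal((n_obs, n_assets))
eig = correlation_eigenvalues(noise)
frac = fraction_in_mp_band(eig, n_assets, n_obs)
print(frac)
```

Applying the same two functions to returns simulated with a small $S$ would be one way to quantify how many eigenvalues fall outside the band, in the spirit of reading $S$ as a measure of distance from a random matrix.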
FIG. 2: Eigenvalue distribution for $N = 200$: simulated eigenvalues for different values of $S$ in the stationary case ($\sigma_\epsilon = 0.0$), compared with the MP distribution.



Another interesting feature of the simulated time series is the possibility of studying non-stationarity in the covariance matrix and the returns. In order to observe the long-run behavior, Equation (2) was used to generate a mean-reverting random walk in the parameters. Figure 3 shows the results for a run with $2^{16}$ time observations of $N = 5$ assets, with $M = 2$ and $S = 1$. The non-stationarity parameters were chosen as $\alpha = 0.001$ and $\sigma_\epsilon = 0.02$.

It is easy to see the volatility clustering in the time series. Two effects are actually responsible for it: the random walk of the $\varphi_{ij}$ real parameters and a less important, but still present, effect of the random parameters. The latter arises because, if the $S$ random parameters happen to be drawn larger than expected once, the variance at that point in time increases, making it more likely that larger random parameters will be observed in the next time period.

In summary, the introduction of random parameters has allowed the proposed model to extend the results of RMT. The resulting model presents a few large eigenvalues (set by $M$); a distribution for the bulk eigenvalues that can be made to fit the data better than RMT or, if necessary, to converge to RMT (by a proper choice of $S$); an easy way to introduce non-stationarity in the returns and in the covariance matrix; and volatility clustering. Finally, as noted for the original model [13], even though normal distributions were used throughout the article, all the observed time series also show an increased kurtosis (except for $\sigma_\epsilon = 0$ and $S = 0$, or as $S \to \infty$). This effect diminishes as $S$ grows, since that limit corresponds to a traditional random matrix, but it is important for the smaller values of $S$ that seem to correspond to real problems.
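One generic way to see how normal draws can still produce excess kurtosis is that a pooled sample of normals with a time-varying variance is a scale mixture, whose excess kurtosis is $3 E[s^4] / E[s^2]^2 - 3 \geq 0$. The sketch below uses a hypothetical two-regime volatility, which is not the model of this article, purely to illustrate that mechanism. For scales 0.5 and 1.5 with equal probability the formula gives an excess kurtosis of 1.92.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis: fourth central moment over squared variance, minus 3."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return (centered ** 4).mean() / (centered ** 2).mean() ** 2 - 3.0

rng = np.random.default_rng(0)
n = 200_000
constant_vol = rng.standard_normal(n)
# Hypothetical two-regime volatility, used only to illustrate the mixture effect.
vol = rng.choice([0.5, 1.5], size=n)
changing_vol = vol * rng.standard_normal(n)

print(excess_kurtosis(constant_vol))
print(excess_kurtosis(changing_vol))
```

The constant-variance sample should give a value near zero, while the mixed sample should give a clearly positive one, even though every individual draw is normal.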
FIG. 3: Simulated observed returns for $N = 5$ assets, with $\sigma_\epsilon = 0.02$ and mean reversion given by $\alpha = 0.001$.



